Robinia: Scalable Framework for Data-intensive Scientific Computing on Wide Area Network

Authors

  • YANG GU
  • GUOQING LI
  • QUAN ZOU
  • ZHENCHUN HUANG
Abstract

With data from scientific devices and models growing continuously, data exploration has become one of the four paradigms of scientific research. This shift demands faster, larger-scale, and more complex processing, and parallelism is increasingly important for scientific data analysis applications. However, problems such as unstable wide-area networks and heterogeneity among computing platforms make it difficult to build scalable parallel scientific applications, especially wide-area parallel applications that must process big data from geographically distributed research institutes to enable complex data analysis for "grand challenge" problems. This paper proposes Robinia, a data-intensive computing framework that exploits parallelism among processing nodes over a wide-area network for data-intensive analysis of scientific big data. Robinia integrates distributed resources such as scientific data, processing algorithms, and storage services in a platform-independent framework; provides a unified execution environment for distributed spatial applications based on wide-area networks; and helps them exploit parallelism through a well-defined web-based programming interface. Experiments on a prototype system and demo applications show that scientific analysis applications based on Robinia can achieve higher performance and better scalability by simultaneously analyzing big data stored in distributed locations across a wide-area network such as the Internet.

Similar resources

Data Replication-Based Scheduling in Cloud Computing Environment

Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...

Biomolecular committor probability calculation enabled by processing in network storage

Computationally complex and data intensive atomic scale biomolecular simulation is enabled via processing in network storage (PINS): a novel distributed system framework to overcome bandwidth, compute, storage, organizational, and security challenges inherent to the wide-area computation and storage grid. PINS is presented as an effective and scalable scientific simulation framework to meet the...

Scalable Bulk Data Transfer in Wide Area Networks

Bulk data transfer in wide area networks (WAN) requires scalable and high network bandwidth. In this paper, we identify a number of the scalability limitations that affect the full utilization of peak theoretical network bandwidth. In addition, we study and classify different offered approaches to overcome some of the identified limitations and increase network bandwidth among Grid components i...

Simulation of Terabit Data Flows for Exascale Applications

Scientific workflows are increasingly drawing attention as both data and compute resources are getting bigger, heterogeneous, and distributed. Many science workflows are both compute and data intensive and use distributed resources. This situation poses significant challenges in terms of real-time remote analysis and dissemination of massive datasets to scientists across the community. These ch...

A New Framework for Increasing the Sustainability of Infrastructure Measurement of Smart Grid

Advanced Metering Infrastructure (AMI) is one of the most significant applications of the Smart Grid. It is used to measure, collect, and analyze data on power consumption. In the AMI network, smart meter traffic is aggregated at intermediate aggregators and forwarded to the Meter Data Management System (MDMS). The infrastructure used in this network should be reliable, real-time an...

Publication date: 2014